A Variation on the Boyer-Moore Algorithm

Author

Thierry Lecroq

Abstract

String matching consists of finding all the occurrences of a word w in a text t. Several algorithms have been found for solving this problem. They are presented by Aho in a recent book [1]. Among these algorithms, the Boyer-Moore approach [5, 11] seems to lead to the fastest algorithms for the search phase. Even if the original version of the Boyer-Moore algorithm has a quadratic worst case, its behavior in practice seems to be sublinear. Furthermore, other authors [9, 2] have improved this worst-case time complexity for the search phase so that it becomes linear in the length of the text. The best bound for the number of letter comparisons is due to Apostolico and Giancarlo [2] and is 2n - m + 1, where n is the length of the text and m the length of the word. Another particularity of the Boyer-Moore algorithm is that the study of its complexity is not obvious; see [10, 7]. Basically, the Boyer-Moore algorithm tries to find, for a given position in the text, the longest suffix of the word which ends at that position. The new approach can instead compute, for a given position in the text, the length of the longest prefix of the word which ends at that position. Knowing this length, we are able to compute a better shift than the Boyer-Moore approach. In the first version, each new matching attempt starts from scratch, forgetting all the prefixes matched previously. This leads to a very simple algorithm, but it has a quadratic worst-case running time. In an improved version, we memorize the position where the previous longest prefix found ends, and the new attempt matches only the number of characters corresponding to the complement of this prefix. We are then able to compute a shift without reading again backwards more than half the characters of the prefix found in the previous attempt. This leads to a linear-time algorithm which scans each text character at most three times.
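
The first, quadratic version described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the longest prefix is computed, so this sketch assumes plain substring tests in place of any preprocessing; the names `find_all`, `text`, and `pat` are hypothetical and not taken from the paper. Each attempt reads the window from right to left, remembers the longest prefix of the word that ends the window, and shifts so that this prefix is aligned with the start of the word.

```python
def find_all(text: str, pat: str) -> list[int]:
    """Return the start index of every occurrence of pat in text.

    Illustrative sketch only: substring tests stand in for whatever
    preprocessing the paper uses (an assumption, not the author's code).
    """
    n, m = len(text), len(pat)
    hits: list[int] = []
    if m == 0 or m > n:
        return hits
    pos = 0  # left end of the current window text[pos:pos+m]
    while pos <= n - m:
        j = m      # text[pos+j:pos+m] is the part of the window read so far
        shift = m  # default: no non-empty prefix of pat ends the window
        # Read the window right to left while the part read occurs in pat.
        while j > 0 and text[pos + j - 1:pos + m] in pat:
            j -= 1
            if pat.startswith(text[pos + j:pos + m]):
                if j == 0:
                    hits.append(pos)  # the whole window equals pat
                else:
                    shift = j         # longer prefix found; align it next
        pos += shift  # every attempt forgets the prefixes matched before
    return hits
```

For example, `find_all("abracadabra", "abra")` returns `[0, 7]`. Because each attempt restarts without any memory of earlier ones, the same text characters can be read many times, which is the quadratic behavior that the improved version in the abstract is designed to avoid.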

Similar resources

On obtaining the Boyer-Moore string-matching algorithm by partial evaluation

We present the first derivation of the search phase of the Boyer-Moore string-matching algorithm by partial evaluation of an inefficient string matcher. The derivation hinges on identifying the bad-character-shift heuristic as a binding-time improvement, bounded static variation. An inefficient string matcher incorporating this binding-time improvement specializes into the search phase of the Hor...

Forward-Fast-Search: Another Fast Variant of the Boyer-Moore String

Abstract. We present a variation of the Fast-Search string matching algorithm, a recent member of the large family of Boyer-Moore-like algorithms, and we compare it with some of the most effective string matching algorithms, such as Horspool, Quick Search, Tuned Boyer-Moore, Reverse Factor, Berry-Ravindran, and Fast-Search itself. All algorithms are compared in terms of run-time efficiency, number of...

Forward-Fast-Search: Another Fast Variant of the Boyer-Moore String Mat

A Space-Efficient Implementation of the Good-Suffix Heuristic

We present an efficient variation of the good-suffix heuristic, first introduced in the well-known Boyer-Moore algorithm for the exact string matching problem. Our proposed variant uses only constant space while retaining much the same time efficiency as the original rule, as shown by extensive experimentation.

Enhanced Pattern Matching Performance Using Improved Boyer Moore Horspool Algorithm

In computer science, the Boyer-Moore-Horspool algorithm is an algorithm for finding substrings in strings. Pattern matching approaches can be classified as software-based or hardware-based, depending on how they are implemented. It is important to enhance pattern matching performance. This paper proposes enhanced pattern matching performance using an improved Boyer-Moore-Horspool algorithm. It combines the determinis...

String Matching Rules Used by Variants of Boyer-Moore Algorithm

The string matching problem is a widely studied problem in computer science, mainly because of its many applications in various fields. In this regard, many string matching algorithms have been proposed. Boyer-Moore is the most popular of them; hence, most of the proposed variants are derived from the Boyer-Moore (BM) algorithm. This paper addresses the variant of Boyer-Moore algorithm for finding the occurrences of...

Journal title:

Theor. Comput. Sci.

Volume 92

Publication date: 1992
